 call sequence


Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling

Maekawa, Seiji, Hassell, Jackson, Pezeshkpour, Pouya, Mitchell, Tom, Hruschka, Estevam

arXiv.org Artificial Intelligence

As language models gain access to external tools via structured function calls, they become increasingly capable of solving complex, multi-step tasks. However, existing benchmarks for tool-augmented language models (TaLMs) provide insufficient control over factors such as the number of functions accessible, task complexity, and input size, and remain vulnerable to data contamination. We present FuncBenchGen, a unified, contamination-free framework that evaluates TaLMs by generating synthetic multi-step tool-use tasks. The key idea is to cast tool use as traversal over a hidden function-dependency DAG where nodes are function calls and an edge between nodes represents one function consuming the output of another. Given a set of external function schemas, initial variable values, and a target variable, models must compose the correct call sequence to compute the target variable. FuncBenchGen allows users to precisely control task difficulty (e.g., graph size, dependency depth, and distractor functions) while avoiding data leakage. We apply our FuncBenchGen framework to evaluate seven LLMs on tool-use tasks of varying difficulty. Reasoning-optimized models consistently outperform general-purpose models, with GPT-5 significantly outperforming other models. Performance declines sharply as dependency depth increases. Furthermore, connected irrelevant functions prove especially difficult to handle. We find that strong models often make syntactically valid function calls but propagate incorrect or stale argument values across steps, revealing brittle state tracking by LLMs in multi-turn tool use. Motivated by this observation, we introduce a simple mitigation strategy that explicitly restates prior variable values to the agent at each step. Surprisingly, this lightweight change yields substantial gains across models, e.g., a success rate improvement from 62.5% to 81.3% for GPT-5.
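The DAG formulation in this abstract can be illustrated with a minimal sketch. Everything here is a hypothetical illustration rather than FuncBenchGen's actual code or task format: the function names (`f_a`, `f_b`, `f_c`, `f_distractor`), the schema layout, and the `solve` helper are all assumed. The sketch resolves a target variable by recursively calling whichever function produces each missing input, and records the resulting call sequence.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema layout (an assumption, not the benchmark's format):
# function name -> (input variable names, output variable name, implementation)
Schema = Tuple[List[str], str, Callable[..., int]]

FUNCTIONS: Dict[str, Schema] = {
    "f_a": (["x"], "a", lambda x: x + 1),
    "f_b": (["a"], "b", lambda a: a * 2),
    "f_c": (["a", "b"], "c", lambda a, b: a + b),
    "f_distractor": (["x"], "d", lambda x: x - 1),  # not needed for target "c"
}


def solve(target: str, known: Dict[str, int]) -> Tuple[int, List[str]]:
    """Return the target value and the ordered list of function calls used."""
    # Map each output variable to the function that produces it.
    producers = {out: name for name, (_, out, _) in FUNCTIONS.items()}
    state = dict(known)  # explicit variable state, updated after every call
    calls: List[str] = []

    def resolve(var: str) -> int:
        if var in state:
            return state[var]
        name = producers[var]
        inputs, out, fn = FUNCTIONS[name]
        args = [resolve(i) for i in inputs]  # compute dependencies first
        state[out] = fn(*args)
        calls.append(name)
        return state[out]

    return resolve(target), calls


value, calls = solve("c", {"x": 3})
print(value, calls)  # prints: 12 ['f_a', 'f_b', 'f_c']
```

The explicit `state` dictionary plays a role loosely analogous to the mitigation the abstract describes: every known variable value is kept available at each step instead of being recalled from earlier turns.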


Leveraging LSTM and GAN for Modern Malware Detection

Gupta, Ishita, Kumari, Sneha, Jha, Priya, Ghosh, Mohona

arXiv.org Artificial Intelligence

The surge in malware poses a danger to cyberspace comparable to the effect of climate change on ecosystems. Despite significant investments in cybersecurity technologies and staff training, the global community remains locked in a perpetual war against cybersecurity threats. The multiform, ever-changing nature of malware continuously pushes the boundaries of the detection and mitigation approaches that cybersecurity practitioners employ. Older techniques such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware types. Consequently, this paper proposes using deep learning models, specifically LSTM networks and GANs, to improve malware detection accuracy and speed. By leveraging raw bytestream-based data and deep learning architectures, this approach provides better accuracy and performance than traditional methods. The LSTM and GAN models are integrated to generate synthetic data, which expands the training datasets and in turn improves detection accuracy. The paper uses the VirusShare dataset, which contains more than one million unique malware samples, as the training and evaluation set for the presented models. With thorough data preparation, including tokenization and augmentation, followed by model training, the LSTM and GAN models perform better than plain classifiers. The research reports 98% accuracy, which shows that deep learning can play a decisive role in proactive cybersecurity defense. In addition, the paper studies ensemble learning and model fusion methods as ways to reduce biases and lift model complexity.


STraceBERT: Source Code Retrieval using Semantic Application Traces

Spiess, Claudio

arXiv.org Artificial Intelligence

Software reverse engineering is an essential task in software engineering and security, but it can be a challenging process, especially for adversarial artifacts. To address this challenge, we present STraceBERT, a novel approach that uses a Java dynamic analysis tool to record calls to core Java libraries and pretrains a BERT-style model on the recorded application traces for effective method source code retrieval from a candidate set. Our experiments demonstrate the effectiveness of STraceBERT in retrieving the source code compared to existing approaches. Our proposed approach offers a promising solution to the problem of code retrieval in software reverse engineering and opens up new avenues for further research in this area.


GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences

Tony, Catherine, Ferreyra, Nicolás E. Díaz, Scandariato, Riccardo

arXiv.org Artificial Intelligence

GitHub is a popular data repository for code examples. It is continuously used to train AI-based tools that automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically those caused by incorrect cryptographic API call sequences on GitHub. We also analyze the suitability of GitHub data for training a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that it is imperative not to neglect the misuses in API call sequences when using data sources like GitHub to train models that generate code.


Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

Hsiao, S. W., Chu, P. Y.

arXiv.org Artificial Intelligence

Malicious software (malware) causes much harm to our devices and lives. We are eager to understand malware behavior and the threats it poses. Most malware record files are variable-length, text-based files with time stamps, such as event log data and dynamic analysis profiles. Using the time stamps, we can sort such data into sequence-based data for subsequent analysis. However, dealing with variable-length, text-based sequences is difficult. In addition, unlike natural language text data, most sequential data in information security have specific properties and structure, such as loops, repeated calls, and noise. To deeply analyze the API call sequences with their structure, we use graphs, such as the Markov model, to represent the sequences, which makes it possible to further investigate their information and structure. Therefore, we design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze the API call sequences. Through AWGCN, we can obtain sequence embeddings to analyze the behavior of the malware. Moreover, the classification experiment results show that AWGCN outperforms other classifiers on the call-like datasets, and the embeddings can further improve the performance of classic models.
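One common way to turn a call sequence into a graph, in the spirit of the Markov-model representation mentioned above, is to count transitions between consecutive calls. The sketch below is an assumption about that general idea, not the AWGCN pipeline itself; the `transition_graph` helper and the API names in `trace` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_graph(calls: List[str]) -> Dict[str, Dict[str, float]]:
    """Build a first-order Markov transition graph from an ordered call trace.

    Nodes are API names; each edge weight is the empirical probability of
    moving from one call to the next.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1

    graph: Dict[str, Dict[str, float]] = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[src] = {dst: n / total for dst, n in outgoing.items()}
    return graph


# Hypothetical trace, already sorted by time stamp; note the repeated call.
trace = ["OpenFile", "ReadFile", "ReadFile", "ReadFile", "CloseFile"]
graph = transition_graph(trace)
print(graph["ReadFile"])  # self-loop weight 2/3, edge to CloseFile weight 1/3
```

Repeated calls collapse into a weighted self-loop, so traces of different lengths map onto graphs over the same node set, which is one reason a graph view suits variable-length sequences.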


Everyone Knows that Everyone Knows: Gossip Protocols for Super Experts

van Ditmarsch, Hans, Gattinger, Malvin, Ramezanian, Rahim

arXiv.org Artificial Intelligence

A gossip protocol is a procedure for sharing secrets in a network. The basic action in a gossip protocol is a telephone call wherein the calling agents exchange all the secrets they know. An agent who knows all secrets is an expert. The usual termination condition is that all agents are experts. Instead, we explore protocols wherein the termination condition is that all agents know that all agents are experts. We call such agents super experts. Additionally, we model that agents who are super experts do not make and do not answer calls. Such agents are called engaged agents. We also model that such gossip protocols are common knowledge among the agents. We investigate conditions under which protocols terminate, both in the synchronous case, where there is a global clock, and in the asynchronous case, where there is not. We show that a commonly known protocol with engaged agents may terminate faster than the same protocol without engaged agents.
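The basic call mechanics described above can be sketched in a few lines. This is only an illustration with assumed helper names (`run_calls`, `experts`) and a hand-picked call schedule; it tracks which secrets each agent knows, and it does not model the higher-order knowledge needed for super experts, engaged agents, or synchrony.

```python
from typing import Dict, List, Set, Tuple


def run_calls(n: int, calls: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Simulate telephone calls among n agents; agent i starts with secret i."""
    secrets: Dict[int, Set[int]] = {a: {a} for a in range(n)}
    for caller, callee in calls:
        # In a call, both agents exchange all the secrets they know.
        merged = secrets[caller] | secrets[callee]
        secrets[caller] = set(merged)
        secrets[callee] = set(merged)
    return secrets


def experts(secrets: Dict[int, Set[int]]) -> Set[int]:
    """An expert is an agent who knows every secret."""
    return {a for a, known in secrets.items() if len(known) == len(secrets)}


schedule = [(0, 1), (2, 3), (0, 2), (1, 3)]  # hypothetical call sequence
final = run_calls(4, schedule)
print(experts(final))  # prints: {0, 1, 2, 3}
```

After the third call only agents 0 and 2 are experts; the usual termination condition is met after the fourth. Deciding when every agent also knows that all agents are experts requires reasoning about what each agent can infer from its own call history, which this sketch leaves out.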


Open Problems in a Logic of Gossips

Apt, Krzysztof R., Wojtczak, Dominik

arXiv.org Artificial Intelligence

Gossip protocols are programs used in a setting in which each agent holds a secret and the aim is to reach a situation in which all agents know all secrets. Such protocols rely on point-to-point or group communication. Distributed epistemic gossip protocols use epistemic formulas in the component programs for the agents. The advantage of using epistemic logic is that the resulting protocols are very concise and amenable to simple verification. Recently, we introduced a natural modal logic that allows one to express distributed epistemic gossip protocols and to reason about their correctness. We proved that the resulting protocols are implementable and that all aspects of their correctness, including termination, are decidable. To establish these results we showed that both the definition of semantics and of truth of the underlying logic are decidable. We also showed that the analogous results hold for an extension of this logic with the 'common knowledge' operator. However, several, often deceptively simple, questions about this logic and the corresponding gossip protocols remain open. The purpose of this paper is to list and elucidate these questions and to provide appropriate background information for them in the form of partial or related results.


Verification of Distributed Epistemic Gossip Protocols

Apt, Krzysztof R., Wojtczak, Dominik

Journal of Artificial Intelligence Research

Gossip protocols aim at arriving, by means of point-to-point or group communications, at a situation in which all the agents know each other's secrets. Distributed epistemic gossip protocols use as guards formulas from a simple epistemic logic and as statements calls between the agents. They are natural examples of knowledge-based programs. We prove here that these protocols are implementable, that their partial correctness is decidable, and that termination and two forms of fair termination of these protocols are decidable as well. To establish these results we show that the definition of semantics and of truth of the underlying logic are decidable.


Researchers reveal the first 'primate linguistics' monkey guide

Daily Mail - Science & tech

Linguists and primatologists have joined forces to create the groundwork for 'primate linguistics,' helping to decipher the meanings behind monkey speech. The comprehensive study examines the calls of different species, analyzing the structure and placement of these vocalizations, and explains what individual calls and sequences mean. While the language of primates may not be as complex as our own, researchers say these animals demonstrate linguistic capabilities that are both 'exciting and sometimes challenging.' The study, led by an international team of researchers, was published recently in the journals Natural Language & Linguistic Theory, and builds on earlier research.